
Fix(databricks): Get correct datatypes from information_schema table #5783

Open

MisterWheatley wants to merge 1 commit into SQLMesh:main from MisterWheatley:fix_databricks_column_types

Conversation


@MisterWheatley MisterWheatley commented May 1, 2026

Description

Updated DatabricksEngineAdapter to query system.information_schema.columns (documented here) instead of running DESCRIBE <table_fqn>, since DESCRIBE has problems in particular with long and nested struct columns (see issue #5781).
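
For context, the full_data_type column in system.information_schema.columns carries the complete nested type string, which sqlglot can parse directly. A minimal sketch of that round trip, using an illustrative struct definition rather than one taken from a real table:

from sqlglot import exp

# Illustrative full_data_type value of the kind returned by
# system.information_schema.columns for a nested struct column.
full_data_type = "struct<user: struct<id: bigint, tags: array<string>>, ts: timestamp>"

dtype = exp.DataType.build(full_data_type, dialect="databricks")
# Render the parsed type back as Databricks SQL, preserving the nesting.
print(dtype.sql(dialect="databricks"))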

Test Plan

A manual test was performed with a script against a sample table. Minimal script (imports added here for completeness; the adapter and connection are assumed to be set up elsewhere and passed in):

import typing as t

from sqlglot import exp

from sqlmesh.core.engine_adapter import EngineAdapter


def get_dbx_columns(
    adapter: EngineAdapter, table: str, schema: str, catalog: str
) -> t.Optional[dict[str, exp.DataType]]:
    # Query system.information_schema.columns for the full (nested) data type
    # of every column in the given table, in ordinal order.
    info_schema = exp.to_table("system.information_schema.columns")
    query = (
        exp.select("column_name", "full_data_type")
        .from_(info_schema)
        .where(
            exp.and_(
                exp.column("table_catalog", table="columns").eq(catalog),
                exp.column("table_schema", table="columns").eq(schema),
                exp.column("table_name", table="columns").eq(table),
            )
        )
        .order_by("ordinal_position ASC")
    )

    result = adapter.fetchall(query)
    # Parse each full_data_type string into a sqlglot DataType using the
    # Databricks dialect.
    parsed_columns = {
        row.column_name: exp.DataType.build(row.full_data_type, dialect="databricks")
        for row in result
    }
    return parsed_columns

The output was then manually checked to correspond with the table definition shown in the Databricks UI.
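
For illustration, a call against a hypothetical table might look like the following (the catalog, schema, and table names are placeholders, and the adapter is assumed to be constructed elsewhere as above):

# Hypothetical invocation; "main", "analytics", and "events" are made-up names.
columns = get_dbx_columns(adapter, table="events", schema="analytics", catalog="main")
for name, dtype in columns.items():
    # Render each parsed type back as Databricks SQL so it can be compared
    # against the column definitions shown in the UI.
    print(name, dtype.sql(dialect="databricks"))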

NB: When running make fast-test, some mssql and fabric tests were observed to fail with a "cannot open shared object file" error. I assume these are unrelated to the changes to the Databricks adapter.

In addition, tests/core/test_context.py::test_wildcard failed with a "SQLMesh project config could not be found" error, again assumed to be unrelated to this change.

Additional tests were not added, as the signature of the function itself did not change.

Checklist

  • I have run make style and fixed any issues
  • I have added tests for my changes (if applicable)
  • All existing tests pass (make fast-test)
  • My commits are signed off (git commit -s) per the DCO

…s for columns

Signed-off-by: Bjarke Enkelund <47357343+MisterWheatley@users.noreply.github.com>
@MisterWheatley MisterWheatley changed the title from "Fix(databricks): Get correct datatypes for columns in all cases" to "Fix(databricks): Get correct datatypes from information_schema table" on May 1, 2026
